
    Multiple Retrieval Models and Regression Models for Prior Art Search

    This paper presents PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the system built for the IP track of CLEF 2009. Our approach has three main characteristics: 1. the use of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the track (English, French, German), producing ten different sets of ranked results; 2. the merging of the different results with multiple regression models, using an additional validation set created from the patent collection; 3. the exploitation of patent metadata and citation structures to create restricted initial working sets of patents and to produce a final re-ranking regression model. Since the specific metadata of the patent documents and the citation relations are exploited only when creating the initial working sets and during the final post-ranking step, our architecture remains generic and easy to extend.
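
    The merging step above lends itself to a short illustration. Below is a minimal sketch, assuming scores from the ten runs are already computed and normalised; the data shapes, the relevance labels and the choice of scikit-learn's LinearRegression are illustrative stand-ins, not the paper's actual configuration.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Hypothetical validation set: one row per (topic, candidate patent) pair,
    # one column per retrieval run (KL/Okapi x lemma/phrase/concept indexes).
    X_val = rng.random((500, 10))                   # scores from the ten runs
    y_val = (rng.random(500) > 0.9).astype(float)   # 1.0 if a true citation

    # Fit a regression model mapping the vector of run scores to relevance.
    merger = LinearRegression().fit(X_val, y_val)

    # Merge: rank the candidates of a new topic by predicted relevance.
    X_new = rng.random((200, 10))
    ranking = np.argsort(-merger.predict(X_new))    # best candidates first
    ```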

    Simple vs. sophisticated approaches for patent prior-art search

    Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state of the art in patent prior-art search. The first uses simple, straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques that try to model the steps taken by a patent examiner. Experiments show that the retrieval effectiveness of the two techniques is statistically indistinguishable when patent applications contain some initial citations, whereas the advanced technique is statistically better when no initial citations are provided. Our findings suggest that when initial citations are provided, simple IR approaches suffice and save time and effort.
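
    As an illustration of the "simple IR" side of such a comparison, here is a minimal, self-contained BM25 scorer over a toy patent corpus; the corpus, query and parameter values are illustrative, and the paper's actual baseline may differ.

    ```python
    import math
    from collections import Counter

    def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
        """Score each tokenized document against the query with standard BM25."""
        N = len(docs)
        avgdl = sum(len(d) for d in docs) / N
        df = Counter(t for d in docs for t in set(d))   # document frequencies
        scores = []
        for d in docs:
            tf = Counter(d)
            s = 0.0
            for t in query_terms:
                if tf[t] == 0:
                    continue
                idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            scores.append(s)
        return scores

    corpus = [doc.split() for doc in [
        "rotor blade cooling channel",
        "battery cell thermal management",
        "rotor blade with internal cooling passage",
    ]]
    print(bm25_scores("rotor cooling".split(), corpus))  # docs 0 and 2 score highest
    ```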

    GROBID: combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications

    Based on state-of-the-art machine learning techniques, GROBID (GeneRation Of BIbliographic Data) performs reliable bibliographic data extraction from scholarly articles, combined with multi-level term extraction. These two types of extraction present synergies and correspond to complementary descriptions of an article. The tool is intended as a component for enhancing existing and future large repositories of technical and scientific publications.
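
    For context, a GROBID service is typically queried over its REST API; the sketch below assumes a server running locally on the default port 8070, and article.pdf is a placeholder input.

    ```python
    import requests

    GROBID_URL = "http://localhost:8070/api/processFulltextDocument"

    # "article.pdf" stands in for any scholarly PDF.
    with open("article.pdf", "rb") as pdf:
        resp = requests.post(GROBID_URL, files={"input": pdf}, timeout=60)

    resp.raise_for_status()
    tei_xml = resp.text   # TEI XML: header metadata, body structure, bibliography
    print(tei_xml[:500])
    ```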

    Using citation-context to reduce topic drifting on pure citation-based recommendation

    Recent work on academic recommender systems has demonstrated the effectiveness of co-citation and citation closeness for related-document recommendation. However, documents recommended by such systems may drift away from the main theme of the query document. In this work, we investigate whether incorporating the textual information in close proximity to a citation, as well as the citation position, can reduce such drift and further improve the recommender's performance. To investigate this, we run experiments with several recommendation methods on a newly created and now publicly available dataset containing 53 million unique citation-based records. We then conduct a user-based evaluation with domain-knowledgeable participants. Our results show that a new method combining Citation Proximity Analysis (CPA), topic modelling and word embeddings achieves more than 20% improvement in Normalised Discounted Cumulative Gain (nDCG) over CPA alone.
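
    A minimal sketch of the Citation Proximity Analysis idea underlying the compared methods: documents whose in-text citations appear close together are taken to be related. The citing-paper data and the 1/distance weighting below are illustrative; the exact scheme used in the paper may differ.

    ```python
    from collections import defaultdict
    from itertools import combinations

    # Hypothetical citing paper: (cited document id, token position of the citation).
    citations = [("docA", 120), ("docB", 128), ("docC", 890), ("docA", 902)]

    # Accumulate a proximity weight for every pair of distinct cited documents.
    cpa = defaultdict(float)
    for (d1, p1), (d2, p2) in combinations(citations, 2):
        if d1 != d2:
            cpa[frozenset((d1, d2))] += 1.0 / max(abs(p1 - p2), 1)

    # Recommend the documents most strongly co-cited with a query document.
    query = "docA"
    related = sorted(
        ((next(iter(pair - {query})), w) for pair, w in cpa.items() if query in pair),
        key=lambda x: -x[1])
    print(related)   # docB first: it is cited right next to a docA citation
    ```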

    Representing and Using the Constraints of Spoken Language with a Lexicalized Tree-Adjoining Grammar

    This article addresses the problem of parsing incomplete spoken utterances in the context of human-machine dialogue. Minimal constraints of spoken language can nevertheless be exploited in order to remain predictive in the face of ellipsis phenomena. We propose an enrichment of the LTAG formalism to capture these constraints and to adapt to speech a grammar initially designed for written language.

    GRISP: A Massive Multilingual Terminological Database for Scientific and Technical Domains

    Developing a multilingual terminology is a very long and costly process. We present the creation of GRISP, a multilingual terminological database covering multiple technical and scientific fields, built from various open resources. A crucial aspect is the merging of the different resources, which in our proposal is based on the definition of a sound conceptual model, mappings between domains, and the use of structural constraints and machine learning techniques to control the fusion process. The result is a massive terminological database of several million terms, concepts, semantic relations and definitions. This resource has allowed us to significantly improve the mean average precision of an information retrieval system applied to a large collection of multilingual and multidomain patent documents.
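
    As a rough illustration of the fusion idea, the sketch below merges term records from two hypothetical resources into one concept when a normalised form and a mapped domain agree; the real process described above also involves structural constraints and machine learning, which are not modelled here.

    ```python
    from collections import defaultdict

    # Hypothetical mapping between the resources' own domain labels.
    DOMAIN_MAP = {"chem": "chemistry", "chimie": "chemistry", "mech": "engineering"}

    records = [
        {"term": "Polymerization", "lang": "en", "domain": "chem",   "src": "resourceA"},
        {"term": "polymerization", "lang": "en", "domain": "chimie", "src": "resourceB"},
        {"term": "gearbox",        "lang": "en", "domain": "mech",   "src": "resourceA"},
    ]

    # Merge records into one concept when normalised form and domain agree.
    concepts = defaultdict(list)
    for r in records:
        concepts[(r["term"].lower(), DOMAIN_MAP[r["domain"]])].append(r)

    for key, entries in concepts.items():
        print(key, [e["src"] for e in entries])   # the two polymerization records merge
    ```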

    A Framework for Multi-level Linguistic Annotation

    This article presents a 3-step model for multi-layer annotation of corpora. Each kind of annotation of a textual corpus corresponds to a different view on the same document. This principle is first expressed with a general relational model dedicated to the organisation of language resources. This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. Exploiting this kind of annotated corpus requires efficient manipulation processes and reversible access. We propose a third-step representation based on a set of optimised finite-state automata (FSA) resulting from the parsing of the XML documents. These proposals have been implemented in the first version of a workbench dedicated to the French Le Monde corpus.
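
    A minimal sketch of the underlying stand-off principle, in which each annotation layer is a separate view pointing into the same base text; the dataclass encoding is illustrative, not the article's relational model or XML format.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Span:
        start: int   # character offset into the shared base text
        end: int
        label: str

    text = "Le Monde est un journal."

    # Each layer is an independent view pointing into the same document.
    layers = {
        "pos":      [Span(0, 2, "DET"), Span(3, 8, "PROPN"), Span(9, 12, "VERB"),
                     Span(13, 15, "DET"), Span(16, 23, "NOUN")],
        "entities": [Span(0, 8, "ORG")],   # "Le Monde"
    }

    for name, spans in layers.items():
        print(name, [(text[s.start:s.end], s.label) for s in spans])
    ```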

    HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID

    The SemEval task 5 was an opportunity to experiment with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited to produce a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally, a post-ranking was performed based on statistics of the co-usage of keywords in HAL, a large open access publication repository.
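
    The candidate-ranking step can be illustrated directly, since scikit-learn's BaggingClassifier bags decision trees by default. The three features and the labels below are illustrative stand-ins for the structural, content and lexical/semantic feature sets described above.

    ```python
    import numpy as np
    from sklearn.ensemble import BaggingClassifier  # bags decision trees by default

    rng = np.random.default_rng(1)
    X = rng.random((300, 3))        # e.g. [phraseness, informativeness, keywordness]
    y = (X.sum(axis=1) > 1.8).astype(int)   # hypothetical "is a key term" labels

    model = BaggingClassifier(n_estimators=50).fit(X, y)

    # Rank fresh candidates by their predicted probability of being a key term.
    candidates = rng.random((10, 3))
    proba = model.predict_proba(candidates)[:, 1]
    print(np.argsort(-proba))       # candidate indices, best first
    ```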

    A Contribution to Robust Non-Deterministic Parsing for Spoken Dialogue Systems

    We present a robust parsing technique intended to back up the decisions of a speech recognition system. The proposed parsing strategy is based on a compacted lexicalized tree-adjoining grammar and on putting the different hypotheses of the speech recognizer in competition. Robustness issues are studied by considering the interference between speech recognition errors and spontaneous speech phenomena in human-machine dialogues.
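
    A loose illustration of putting recognizer hypotheses in competition: each n-best hypothesis is rescored by combining its ASR confidence with how much of it a grammar can account for. Everything below, including the "grammar" reduced to a toy lexicon, is an illustrative stand-in for the paper's compacted LTAG parser.

    ```python
    # Toy stand-ins: a "grammar" reduced to a lexicon, and a coverage score.
    LEXICON = {"je", "veux", "un", "billet", "pour", "paris"}

    def coverage(words):
        """Fraction of the hypothesis the toy grammar can account for."""
        return sum(w in LEXICON for w in words) / len(words)

    nbest = [                        # (hypothesis, ASR confidence)
        ("je veux un billet pour paris", 0.61),
        ("je veux un mille et pour paris", 0.64),
    ]

    best = max(nbest, key=lambda h: 0.5 * h[1] + 0.5 * coverage(h[0].split()))
    print(best[0])   # the fully parsable hypothesis wins despite a lower ASR score
    ```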

    Synchronized Automata for Integrating Speech Recognition and Natural Language Understanding Techniques

    We present an architecture whose objective is a tight integration of the different levels of spoken language processing. Static knowledge is represented as finite-state automata, allowing optimal sharing of common substructures. These automata are used to implement stochastic, tabular parsing in order to take into account the non-determinism of the different processing levels. Synchronization functions are applied to these automata in order to propagate constraints between levels. The resulting architecture separates symbolic and probabilistic knowledge, interfaces between analysis levels, and control. Experiments based on these principles and representations are currently under way to integrate an analytic segmentation system, a stochastic phonetic recognition module, and a parser based on synchronous LTAGs.
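
    The substructure-sharing idea can be sketched with a toy lexicon automaton in which words sharing a prefix share states and transitions; the dict-of-dicts encoding is illustrative, and the synchronization functions between levels are not modelled here.

    ```python
    def add(automaton, word):
        """Insert a word, reusing existing transitions for shared prefixes."""
        state = automaton
        for ch in word:
            state = state.setdefault(ch, {})
        state["<final>"] = True

    def accepts(automaton, word):
        state = automaton
        for ch in word:
            if ch not in state:
                return False
            state = state[ch]
        return "<final>" in state

    lexicon_fsa = {}
    for w in ["billet", "billets", "bille"]:
        add(lexicon_fsa, w)          # the three words share the path b-i-l-l-e

    print(accepts(lexicon_fsa, "bille"), accepts(lexicon_fsa, "bill"))  # True False
    ```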